Migrate to the reusable tox workflow #1102

Merged
merged 2 commits into from
Nov 7, 2024

Member

kurtmckee commented Nov 6, 2024

This PR migrates the SDK to the reusable tox workflow.

| Metrics | Before this PR | This PR (cache miss) | This PR (cache hit) |
|---|---|---|---|
| Total duration | 2m 0s | 2m 5s | 1m 46s |
| Total run time | 7m 51s | 5m 42s | 4m 9s |
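To read the table in relative terms, here's a small Python sketch that converts the durations to seconds and computes the percentage change against the "Before this PR" column. The helper names `to_seconds` and `percent_change` are purely illustrative; they're not part of the SDK or the workflow.

```python
# Durations copied from the metrics table, as "Xm Ys" strings.
METRICS = {
    "Total duration": ("2m 0s", "2m 5s", "1m 46s"),
    "Total run time": ("7m 51s", "5m 42s", "4m 9s"),
}


def to_seconds(text: str) -> int:
    """Convert a duration like "7m 51s" into whole seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def percent_change(before: str, after: str) -> float:
    """Relative change from `before` to `after`, in percent (1 decimal place)."""
    b, a = to_seconds(before), to_seconds(after)
    return round((a - b) / b * 100, 1)


for name, (before, miss, hit) in METRICS.items():
    print(
        f"{name}: cache miss {percent_change(before, miss):+}%, "
        f"cache hit {percent_change(before, hit):+}%"
    )
```

With these numbers, total run time drops by about 27% on a cache miss and about 47% on a cache hit. Wall-clock duration is about 4% longer on a cache miss and about 12% shorter on a cache hit.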

📚 Documentation preview 📚: https://globus-sdk-python--1102.org.readthedocs.build/en/1102/

kurtmckee added the no-news-is-good-news label (This change does not require a news file) Nov 6, 2024
kurtmckee self-assigned this Nov 6, 2024
kurtmckee force-pushed the reusable-tox-workflow branch 7 times, most recently from 35590ba to 7459db7, November 6, 2024 15:39
kurtmckee force-pushed the reusable-tox-workflow branch from 7459db7 to 8ab0996, November 6, 2024 15:39
kurtmckee marked this pull request as ready for review November 6, 2024 15:43
-
- "requirements/*/*.txt"
- "pyproject.toml"
- "toxfile.py"
Member

I have been wondering, over the past week, about whether or not tox-uv's faster venv building makes it unnecessary to cache the .tox dir contents. As long as the uv action's cache is populated, .tox/ can be quickly rebuilt.
One of the things I wonder is whether or not the balance between the two may, in fact, favor rebuilding over caching (since caching and hashing take some time).

Have you given this any thought?

Member Author

I have. Caching the tarballs and wheels, instead of caching everything that was installed, hasn't previously been faster.

The numbers are borne out best on Windows, so I'll share numbers from the feedparser logs. feedparser tests the highest and lowest supported CPython versions (which I recommend doing here as well, but didn't introduce in this PR).

Here are the timings reported by the feedparser tests on Windows with a cache miss:

  py3.9-chardet: OK (45.76=setup[7.22]+cmd[38.55] seconds)
  py3.13-chardet: OK (41.80=setup[9.69]+cmd[32.11] seconds)
  congratulations :) (87.68 seconds)

and for a cache hit:

  py3.9-chardet: OK (42.22=setup[3.74]+cmd[38.48] seconds)
  py3.13-chardet: OK (31.66=setup[0.21]+cmd[31.45] seconds)
  congratulations :) (74.01 seconds)

(Note that the first tox environment always has the wheel build step counted as part of its setup.) Since the cmd times for each tox environment are within ~0.7s of each other between the cache-miss and cache-hit runs, I'm more inclined to trust that the difference in setup times isn't simply GitHub runner jitter.

So, my interpretation is that this is a win of ~13 seconds across 2 tox environments on Windows.
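The ~13-second figure can be reproduced directly from the tox summary lines above. This is a minimal Python sketch that assumes the summary format is exactly as shown in these logs; the regex and the `parse` helper are illustrative and not part of tox's API.

```python
import re

# Matches tox's per-environment summary line, e.g.
# "py3.9-chardet: OK (45.76=setup[7.22]+cmd[38.55] seconds)"
SUMMARY = re.compile(
    r"(?P<env>\S+): OK \((?P<total>[\d.]+)=setup\[(?P<setup>[\d.]+)\]"
    r"\+cmd\[(?P<cmd>[\d.]+)\] seconds\)"
)


def parse(log: str) -> dict[str, tuple[float, float]]:
    """Map each tox environment name to its (setup, cmd) timings in seconds."""
    return {m["env"]: (float(m["setup"]), float(m["cmd"])) for m in SUMMARY.finditer(log)}


cache_miss = parse("""
  py3.9-chardet: OK (45.76=setup[7.22]+cmd[38.55] seconds)
  py3.13-chardet: OK (41.80=setup[9.69]+cmd[32.11] seconds)
""")
cache_hit = parse("""
  py3.9-chardet: OK (42.22=setup[3.74]+cmd[38.48] seconds)
  py3.13-chardet: OK (31.66=setup[0.21]+cmd[31.45] seconds)
""")

# Total setup time saved across both environments by the cache hit.
setup_saved = sum(s for s, _ in cache_miss.values()) - sum(s for s, _ in cache_hit.values())
# How far the test-command times drifted between runs (a rough jitter check).
cmd_drift = {env: round(abs(cache_miss[env][1] - cache_hit[env][1]), 2) for env in cache_miss}

print(f"setup time saved: {setup_saved:.2f}s")
print(f"cmd drift per env: {cmd_drift}")
```

This reports 12.96 seconds of setup time saved, and cmd drift of 0.07s and 0.66s for the two environments.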

The cache steps tell the same story. In the cache-miss job, looking up the cache (and missing) took 1 second and uploading the new cache took 5 seconds. In the cache-hit job, downloading the cache took 2 seconds. That's an additional ~4 seconds won.
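Combining those cache-step timings with the setup times gives a job-level comparison. This minimal Python sketch uses only the numbers quoted in this thread. It leaves out the test commands (nearly identical in both runs) and every other job step, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```python
# Cache-action timings from the feedparser Windows jobs (seconds).
LOOKUP_MISS = 1  # look up the cache and miss
UPLOAD = 5       # save the new cache at the end of the cache-miss job
DOWNLOAD = 2     # restore the cache in the cache-hit job

# tox setup totals across both environments (seconds).
SETUP_MISS = 7.22 + 9.69
SETUP_HIT = 3.74 + 0.21

# Time spent on cache steps plus tox setup in each job.
miss_job = LOOKUP_MISS + SETUP_MISS + UPLOAD
hit_job = DOWNLOAD + SETUP_HIT

print(f"cache-step overhead won: {(LOOKUP_MISS + UPLOAD) - DOWNLOAD}s")
print(f"cache-miss job: {miss_job:.2f}s, cache-hit job: {hit_job:.2f}s")
print(f"difference: {miss_job - hit_job:.2f}s")
```

With these inputs the cache-step overhead differs by 4 seconds, and the cache-miss job spends about 17 seconds more than the cache-hit job on cache steps plus tox setup.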

I have consistently found that it's faster to cache what's installed, rather than caching what needs to be installed. tox-uv makes environment creation and package installation fast, but I don't think it's fast enough.

You're welcome to try improving on this! It's mechanically trivial, but extremely time-consuming. Here are the steps:

  1. Create a branch off this project (or my own workflow repo)
  2. Point a second project with a "significant" test suite at the new branch
  3. Repeatedly push and force-push to the second project, possibly manually deleting the caches, and keep switching back to the workflow project to make and push changes.

Member

I find this explanation 110% satisfactory. I'm probably not going to experiment with this at least within the next few days: my main question was about the comparison between time(cache miss + tox_uv setup + cache save) vs time(cache hit + tox_uv setup) and you've already provided numbers for that.

I am willing to accept some minor regressions in CI speed if they give us other improvements (e.g., workflow simplicity). In particular, as you've converted us over to the new workflow, I've been trying to track in each PR exactly what is being used for cache keys, and whether it's "correct".
The uv action cache carries all of the raw packages already (in the runner's homedir), so there's some interesting interplay there with the .tox dir.

Thanks for laying this all out for me!

Member Author

kurtmckee commented Nov 7, 2024

Oh, I think I see what you're referring to. This isn't using the uv GitHub action, so there's no side caching happening, and pip caching isn't enabled for the setup-python action.

For cache keys, here's the rule that I've generally been following:

Already-included files

These files are always included by the reusable workflow:

  • .python-identifiers

    (generated by the kurtmckee/detect-pythons action; ensures that the cache -- which contains symlinks to Python interpreter executables -- is invalidated if the Python versions change)

  • .workflow-config.json

    (ensures that changes to the requested configuration invalidate the cache)

  • tox.ini

    (ensures that changes to the tox configuration invalidate the cache)

Files you should use with cache-key-hash-files

In general, any files that contain tool configuration directives should be hashed for cache-busting.

  • pyproject.toml
  • mypy.ini
  • .flake8
  • .pre-commit-config.yaml
  • setup.cfg
  • requirements/*/*.txt
  • poetry.lock

If these files change, it can indicate that different dependencies should be installed, or that a tool like mypy should change how it's writing its own cache, or any number of other things that might make the workflow cache less useful.
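To make the rule concrete, here's a small Python sketch of how a cache key could be derived from those files. It is illustrative only. It is not the reusable workflow's implementation or GitHub's `hashFiles()`. The `cache_key` function, the `tox-` prefix, and the hashing scheme are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path

# Files the reusable workflow always includes in its cache key (listed above).
ALWAYS_INCLUDED = (".python-identifiers", ".workflow-config.json", "tox.ini")


def cache_key(root: Path, hash_file_patterns: list[str], prefix: str = "tox") -> str:
    """Hypothetical cache-key builder (not the workflow's real implementation).

    Combines the always-included files with any files matching the
    cache-key-hash-files glob patterns, so editing any of them changes the key.
    """
    files = {root / name for name in ALWAYS_INCLUDED if (root / name).is_file()}
    for pattern in hash_file_patterns:
        files.update(path for path in root.glob(pattern) if path.is_file())

    digest = hashlib.sha256()
    # Sort so the key doesn't depend on filesystem iteration order.
    for path in sorted(files):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return f"{prefix}-{digest.hexdigest()}"


if __name__ == "__main__":
    patterns = ["pyproject.toml", "requirements/*/*.txt"]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "tox.ini").write_text("[tox]\nenv_list = py3.13\n")
        (root / "pyproject.toml").write_text('[project]\nname = "demo"\n')
        before = cache_key(root, patterns)

        # A dependency or tool-config change in pyproject.toml busts the cache.
        (root / "pyproject.toml").write_text('[project]\nname = "demo"\nversion = "2"\n')
        print(before != cache_key(root, patterns))  # True
```

Changing `pyproject.toml`, or any other hashed file, produces a different key. That is the cache-busting behavior described above.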

kurtmckee merged commit 0b981a8 into main Nov 7, 2024
7 checks passed
kurtmckee deleted the reusable-tox-workflow branch November 7, 2024 14:15
2 participants